Gene expression analysis of colon high-grade dysplasia revealed new molecular mechanism of disease.

Aim
The aim of this research was to obtain a clear molecular view of dysplasia via network analysis.


Background
Some evidence suggests a relationship between dysplasia and colorectal cancer. A better understanding of high-grade dysplasia (HGD) could be beneficial for colon cancer management.


Methods
A bioinformatics study of HGD versus healthy subjects was conducted to assess the status of differentially expressed genes (DEGs). GSE31106, GPL1261, GSM770092-94 and GSM770101-6 were the sources from the Gene Expression Omnibus (GEO) that were queried for protein-protein interaction (PPI) network analysis via Cytoscape and its algorithms. The hubs of the network were enriched for biochemical pathways and were validated via clustering analysis.


Results
A total of 46 hub nodes were determined and were included in 12 pathways. A main cluster of 76 nodes, containing 45 hubs, was identified. Of the 46 hub genes, 33 were involved in biochemical pathways. IL1B, IL6, TNF, and TLR4 were the most critical genes.


Conclusion
Many different hub genes might influence the onset and development of the advanced condition and also of colon cancer.


Introduction
Colon cancer is the third leading cause of cancer death in the world (1). There is a strong need for early detection and management of this malignant tumor. In this regard, precancerous conditions could be important to study for the introduction of molecular targets (1,2). High-grade dysplasia (HGD) is linked to this transition, as normal tissue develops through precancerous and eventually cancerous stages (4). Therefore, patients with dysplastic lesions should be investigated for follow-up and screening in case cancer develops. A further challenge with molecular targets in precancerous conditions is finding the most accurate and specific ones (2). These agents can be investigated for their topological features in a whole interaction map, which provides more support and validation. Such maps are present in organisms as protein-protein interaction networks based on physical interactions (5). The functions carried out in a cell depend on the normal interactions of these molecules. For the interaction pattern of molecules such as genes and proteins, normal expression and function are key. Changes in the expression of any of these genes, microRNAs, and proteins could result in abnormal interactions (6). The consequences are most severe when genes with high centrality undergo expression changes. Furthermore, gene ontology analysis of the central elements could provide more information about the mechanisms underlying the disease (7). It can show which biological processes related to these central agents could be disrupted and are therefore worth further evaluation. To reach this goal, algorithms are available that analyze the terms corresponding to the genes. In this research, the protein-protein interaction network of differentially expressed genes of HGD in comparison with the normal condition is analyzed to introduce the most promising ones for clinical applications.

Data collection
Microarray data (GSE31106, GPL1261, GSM770092-94 and GSM770101-6) were used to compare the gene expression profiles of a normal group and five-week-old male mice treated with an intraperitoneal injection of 10 mg/kg azoxymethane. The treated samples also received three cycles of dextran sulfate sodium (2%, 1.5%, and 1.5%), while controls received a saline injection and drank distilled water. RNA extracted from colorectal tissue after 6 weeks of treatment was analyzed with the Affymetrix GeneChip Mouse Genome 430 2.0 Array.
Among the 250 most significant DEGs, characterized DEGs with a fold change (FC) less than 0.5 or more than 2 were selected for examination via PPI network analysis.
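The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in the study: the `Deg` record, the `select_degs` function, and the toy gene names and values are all hypothetical, and the fold change is assumed to be on a linear (not log2) scale.

```python
from dataclasses import dataclass


@dataclass
class Deg:
    gene: str           # gene symbol; empty string marks an uncharacterized probe (assumption)
    p_value: float
    fold_change: float  # assumed linear-scale fold change, not log2


def select_degs(records, top_n=250, fc_low=0.5, fc_high=2.0):
    """Keep characterized DEGs with FC < fc_low or FC > fc_high among the top_n by p-value."""
    top = sorted(records, key=lambda r: r.p_value)[:top_n]
    return [
        r for r in top
        if r.gene and (r.fold_change < fc_low or r.fold_change > fc_high)
    ]


# Toy records with made-up names and values, for illustration only.
records = [
    Deg("GeneA", 1e-6, 4.2),   # kept: FC > 2
    Deg("GeneB", 2e-5, 1.3),   # dropped: FC between 0.5 and 2
    Deg("", 3e-5, 5.0),        # dropped: uncharacterized
    Deg("GeneD", 4e-5, 0.3),   # kept: FC < 0.5
]
selected = [r.gene for r in select_degs(records)]
print(selected)
```

If the fold changes were reported as log2 values instead, the equivalent thresholds would be -1 and 1.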

PPI network analysis
The STRING database (8) and Cytoscape software (9) version 6.3.2 were used to construct the PPI network. The network was evaluated with the Network Analyzer application of Cytoscape, and the hubs were selected based on a degree cutoff (mean+2SD) (10). Biological processes related to the hub nodes were identified with the ClueGO v2.5.0 plugin of Cytoscape, from KEGG (20.11.2017).
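The degree cutoff (mean+2SD) can be illustrated with a short Python sketch. It is not the Cytoscape implementation; the function name `find_hubs` and the toy edge list are hypothetical, and two details that the text does not specify are assumed here: the sample standard deviation is used, and a node must have a degree strictly above the cutoff to count as a hub.

```python
from collections import Counter
from statistics import mean, stdev


def find_hubs(edges, n_sd=2.0):
    """Return (hubs, cutoff) for an undirected edge list without duplicate edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    values = list(degree.values())
    # Assumption: sample standard deviation; the text only states "mean+2SD".
    cutoff = mean(values) + n_sd * stdev(values)
    hubs = sorted(node for node, d in degree.items() if d > cutoff)
    return hubs, cutoff


# Toy network: node "A" is connected to eight other nodes, plus one extra edge.
edges = [("A", n) for n in "BCDEFGHI"] + [("B", "C")]
hubs, cutoff = find_hubs(edges)
print(hubs, round(cutoff, 2))
```

In this toy network the mean degree is 2 and the cutoff is about 6.58, so only node "A" (degree 8) qualifies as a hub.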

Statistical analysis
Gene expression profiles were compared via box plot analysis, and a p-value <0.05 was considered significant. Biological processes were identified based on K-score, at least 10 genes per term, and at least 10% participation of genes per term.
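As a rough illustration of what a box plot comparison checks, the following Python sketch computes the five-number summary for each sample and tests whether the sample medians are aligned. The function names, the tolerance, and the expression values are hypothetical and are not taken from the study; the quartile method ("inclusive") is likewise an assumption.

```python
from statistics import median, quantiles


def five_number_summary(values):
    """Minimum, quartiles, and maximum, i.e. the quantities drawn in a box plot."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": q2, "q3": q3, "max": max(values)}


def medians_aligned(samples, tolerance=0.1):
    """True if the medians of all samples lie within `tolerance` of each other."""
    medians = [median(v) for v in samples.values()]
    return max(medians) - min(medians) <= tolerance


# Made-up log-scale expression values for three hypothetical samples.
samples = {
    "control_1": [5.0, 6.0, 7.0, 8.0, 9.0],
    "hgd_1": [5.5, 6.5, 7.0, 8.5, 10.0],
    "hgd_2": [4.0, 6.0, 7.0, 7.5, 9.5],
}
summary = five_number_summary(samples["control_1"])
print(summary)
print(medians_aligned(samples))
```

Samples whose medians differ by more than the chosen tolerance would normally be normalized before differential expression is assessed.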

Results
Gene expression profiles of 6 HGD samples and 3 controls were compared via box plot analysis. As shown in figure 1, the samples are comparable. It appears that 50% of genes show high levels of expression in both HGD and normal control samples. The 250 most significant DEGs were selected based on the p-value criterion. Of these, 24 were uncharacterized and were excluded from further analysis. Of the remaining 226 significant and characterized DEGs, 129 had a fold change less than 0.5 or above 2 and were included in PPI network analysis. The STRING database recognized 98 of these DEGs, and the network was constructed from them together with 50 added relevant genes. The network included 34 isolated nodes and a main connected component. This component, which we call the network of HGD, contains 114 nodes and 1451 edges. Centrality analysis identified 46 hub nodes, which are tabulated in table 1. Betweenness centrality (BC) and closeness centrality (11)
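The separation of a network into isolated nodes and a main connected component, as described in this section, can be sketched with a breadth-first search. This is an illustration under stated assumptions, not the procedure used in Cytoscape; the function name `connected_components` and the toy network are hypothetical.

```python
from collections import defaultdict, deque


def connected_components(nodes, edges):
    """Return the connected components of an undirected graph as a list of sets."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components


# Toy network: three connected nodes and two isolated ones.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C")]
components = connected_components(nodes, edges)
main_component = max(components, key=len)
isolated = [c for c in components if len(c) == 1]
print(sorted(main_component), len(isolated))
```

Applied to the network reported here, the same bookkeeping corresponds to 98 recognized DEGs plus 50 added genes (148 nodes), split into 34 isolated nodes and a main component of 114 nodes.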

Discussion
HGD is important to study for colorectal cancer management. This precancerous stage could be essential for identifying the factors that trigger tumorigenesis. One way to reach this goal is through molecular research. In addition, data mining could add information and value related to the molecular agents identified for any condition (12). Protein-protein interaction network analysis is one way to assess biomarkers in terms of their centrality in an interaction network. In this network analysis approach, we identified the central genes of the network of differentially expressed genes in HGD via the associated methods and algorithms. To do this, the quality of the expression profiles of the healthy and dysplasia groups was first compared in figure 1. The analysis shows that the data are suitable for comparison, as the samples are median-centered. GEO2R identified genes with altered expression, and these genes were queried in Cytoscape for network construction. The network centrality analysis introduced 46 hub genes, almost none of which were among the DEGs of HGD. Moreover, these genes are very close in degree value and could be very important for network integrity. Among 114 nodes, 46 were identified as hubs; in other words, about 40% of the nodes are hubs. Furthermore, these genes fall into four categories: immune-related genes (such as ILs), oncogenes (such as AKT1, JUN, and SRC), metabolism-related genes (especially INS), and other types of genes (13-15).
Figure 3. Protein cluster analysis of hub nodes via the ClusterONE application. The cluster was selected based on at least ten genes per cluster.
The oncogenes AKT1, JUN, and SRC are a gene set that is also prominent in colon cancer (16). The genes belonging to metabolic pathways are GAPDH, INS, and IGF1, which play a significant role in proliferation and apoptosis in colon cancer (11,17). Functional categorization of the hub genes indicated 12 associated pathways (figure 2); except for one pathway, the HIF-1 signaling pathway, which is highlighted in green, the terms are presented in the same group. The distribution of hub genes in the related pathways is shown in table 2. The findings indicate that IL1B and IL6 are the genes most widely involved across the biological terms, while TNF and TLR4 rank next, participating in 10 biological terms. The elements of these two rows are all linked to the immune system category. Furthermore, clustering analysis identified one significant cluster that contains almost all of the hubs, with the exception of NOS3. This clustering could validate the importance of the identified hubs in the HGD network. To gain a better understanding, a literature review was conducted for a possible relationship with colon cancer, covering the hub genes present in the most pathways (the first two rows) as well as the top two of the 46 genes. GAPDH (glyceraldehyde-3-phosphate dehydrogenase), a pleiotropic enzyme (18), has the highest degree, betweenness, and closeness values in this network and is active in apoptosis (19). This gene has been noted for its moonlighting effects in cancer development (20), which suggests a possible key role in the transition from dysplasia to cancer. In addition, its up-regulation has been reported in colorectal cancer (17). IL6, the next gene, is very important in cancer, and its increase has also been associated with colorectal cancer progression (21). The higher the level of IL6 in human serum, the more developed the tumor (22). This gene is also ranked in the first group in table 2.
In this grouping, IL1B is another inflammatory gene that is well known in the gastrointestinal system and also promotes invasion in colorectal tumors (23). For TNF, in the next group, high signaling levels could be important in colon cancer (13). TLR4 has also been reported in colon cancer metastasis; in fact, multiple roles have been identified for this gene (24).
In this research, it was confirmed that advanced dysplasia is accompanied by extensive alterations in the gene expression pattern of the human body. In this regard, the immune system, metabolic pathways, and oncogenes are affected. In addition, deregulation of the immune system and inflammation is prominent in HGD. This complex condition in HGD may lead to the onset of colorectal cancer (25). More comprehensive knowledge of colorectal cancer and its prediction can be gained by identifying the crucial genes involved in HGD (26,27). This set of possible biomarkers and the related biological processes may play critical roles in the transition between HGD and colon cancer. However, determining the exact participation of these genes will require more in-depth research before use in a clinical setting.